Acoustic and language modeling of human and nonhuman noises for human-to-human spontaneous speech recognition
Authors
Abstract
In this paper, several improvements of our speech-to-speech translation system JANUS on spontaneous human-to-human dialogs are presented. Common phenomena in spontaneous speech are described, followed by a classification of different types of noises. To handle the variety of spontaneous effects in human-to-human dialogs, special noise models are introduced representing both human and nonhuman noises, as well as word fragments. It will be shown that both the acoustic and the language modeling of these noises increase the recognition performance significantly. In the experiments, a clustering of the noise classes is performed and the resulting cluster variants are compared, which makes it possible to determine the best tradeoff between sensitivity and trainability of the models.
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Recent Progress in Corpus-Based Spontaneous Speech Recognition
This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance f...